Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing
Authors
Abstract
One of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCCs). MFCCs are computed with a filter bank whose filters are equally spaced on the perceptually motivated mel frequency scale. The value of the mel cepstral vector, and the properties of the corresponding cepstral distance, are determined by several parameters of the mel cepstral analysis. The aim of this work is to examine how well the MFCC distance measure agrees with human perception for different values of those parameters. An analysis of the mel filter bank parameters shows that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient equal to or greater than one gives optimal spectral distortion (SD) distance measures. For this kind of mel filter bank, the difference between vowels can be recognised when the full-length mel cepstral SD RMS measure is higher than 0.4 to 0.5 dB. We further show that using a truncated mel cepstral vector (12 coefficients) is justified for speech recognition but may be questionable for speaker recognition. We also analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. The results show a high correlation between SD distances calculated from the aperiodic and the periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.
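The link between a Euclidean cepstral distance and a spectral distortion in dB can be illustrated with a minimal sketch. It assumes the standard Parseval-type relation between the real cepstrum and the RMS log-spectral difference, with the zeroth (energy) coefficient excluded; the abstract does not state the authors' exact SD definition or MFCC normalisation, so the function name `cepstral_sd_db`, the factor of 2 and the `n_coeffs` truncation argument are assumptions made for illustration only.

```python
import numpy as np

# Converts a natural-log spectral difference to decibels (assumed convention).
DB_FACTOR = 10.0 / np.log(10.0)


def cepstral_sd_db(c_a, c_b, n_coeffs=None):
    """Approximate RMS spectral distortion (dB) from two cepstral vectors.

    c_a, c_b : 1-D sequences [c1, c2, ...] with c0 excluded (assumption).
    n_coeffs : if given, only the first n_coeffs coefficients are used,
               mimicking a truncated cepstral vector (e.g. 12).
    """
    a = np.asarray(c_a, dtype=float)
    b = np.asarray(c_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("cepstral vectors must have the same length")
    if n_coeffs is not None:
        a, b = a[:n_coeffs], b[:n_coeffs]
    # Factor 2 accounts for the symmetry of the real cepstrum (c_n = c_-n).
    return DB_FACTOR * np.sqrt(2.0 * np.sum((a - b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 23-coefficient vectors standing in for two vowel frames.
    frame_a = rng.normal(scale=0.05, size=23)
    frame_b = frame_a + rng.normal(scale=0.01, size=23)
    print("full-length SD [dB]:", cepstral_sd_db(frame_a, frame_b))
    print("truncated SD  [dB]:", cepstral_sd_db(frame_a, frame_b, n_coeffs=12))
```

Because every squared term is non-negative, the truncated value can never exceed the full-length one; how much is lost by truncation is what bears on the 12-coefficient question raised in the abstract.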
Similar resources
An Improved Speech Processing Strategy for Cochlear Implants Based on objective measures for predicting speech intelligibility
The purpose of this study was to improve the speech processing strategy for cochlear implants (CIs). A speech preprocessing algorithm is presented to improve speech intelligibility in noise. The algorithm improves intelligibility by optimally redistributing the speech energy over time and frequency for a perceptual distortion measure. The algorithm is more sensitive to transient regions....
Reducing the effects of linear channel distortion on continuous speech recognition
Linear channel compensation in speech recognition typically involves estimating an additive shift in the cepstral domain. This paper explores both Bayesian and maximum likelihood techniques to transform either the features or the model parameters. Experiments on the Macrophone corpus show error rate reductions over cepstral mean subtraction for short utterances.
Robust distant speech recognition based on position dependent CMN using a novel multiple microphone processing technique
In a distant environment, channel distortion may drastically degrade speech recognition performance. In this paper, we propose a robust multiple microphone speech processing approach based on position dependent Cepstral Mean Normalization (CMN). In the training stage, the system measures the transmission characteristics according to the speaker positions from some grid points in the room and e...
Application of speech conversion to alaryngeal speech enhancement
Two existing speech conversion algorithms were modified and used to enhance alaryngeal speech. The modifications were aimed at reducing spectral distortion (bandwidth increase) in a vector-quantization (VQ) based system and the spectral discontinuity in a linear multivariate regression (LMR) based system. Spectral distortion was compensated for by formant enhancement using chirp z-transform and...
On the use of bandpass liftering in speech recognition
In a template-based speech recognition system, distortion measures that compute the distance or dissimilarity between two spectral representations have a strong influence on the performance of the recognizer. Accordingly, extensive comparative studies have been conducted to determine good distortion measures for improved recognition accuracy. Previous studies have shown that the log li...